Filling acoustic holes through leveraged uncorrelated GMMs for in-set/out-of-set speaker recognition

Authors

  • Jun-Won Suh
  • Pongtep Angkititrakul
  • John H. L. Hansen

Abstract

In this study, the problem of in-set versus out-of-set speaker recognition with limited train/test data is addressed. Since enrollment data are so limited (5 sec), the training tokens leave acoustic holes in the speaker's phoneme space, and these holes must be filled. To achieve this, a process is developed for selecting cohort speakers who possess similar acoustic characteristics. When common sentences are available, the resulting GMMs are used to measure the speakers' acoustic similarity with the Kullback-Leibler (KL) distance. When no common sentence structure exists, likelihood ratio scores are employed to measure speaker similarity instead. Gaussian components corresponding to the acoustic holes are then harvested from the cohort model. A GMM constructed using a phone recognition simulator with 65% accuracy is compared with the GMM employing common utterances on the TIMIT corpus. Finally, the Gaussian components corresponding to acoustic holes are combined with the common acoustic space to improve overall system performance. The proposed acoustic hole filling algorithm is evaluated using speech from the TIMIT and FISHER corpora, with the GMM-UBM as the baseline system. The proposed system is shown to improve performance over the baseline by 25% on TIMIT and 13% on FISHER. This advancement is a significant step forward for in-set/out-of-set speaker recognition with limited training (5 sec) and test material (2-8 sec).
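The abstract does not give implementation details. The sketch below shows one way the three steps could fit together: cohort selection by KL distance, likelihood ratio scoring and harvesting Gaussians for acoustic holes. It rests on several assumptions of our own that do not come from the paper:

* The GMMs use diagonal covariances.
* The KL distance is a symmetric Monte Carlo estimate.
* The target and cohort models share component indexing, as they would if both were MAP-adapted from one UBM.
* A component counts as a "hole" when its enrollment occupancy count falls below a threshold.

All names (`DiagGMM`, `select_cohort`, `fill_holes`, `min_count`) are hypothetical.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass
class DiagGMM:
    weights: list[float]          # mixture weights (assumed to sum to 1)
    means: list[list[float]]      # K x D component means
    variances: list[list[float]]  # K x D diagonal covariances

    def log_likelihood(self, x: list[float]) -> float:
        # log p(x) = logsumexp_k [log w_k + log N(x; mu_k, diag(var_k))]
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            terms.append(ll)
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))

    def sample(self, rng: random.Random) -> list[float]:
        # Pick a component by weight, then draw from its diagonal Gaussian.
        k = rng.choices(range(len(self.weights)), weights=self.weights)[0]
        return [rng.gauss(m, math.sqrt(v)) for m, v in zip(self.means[k], self.variances[k])]


def kl_monte_carlo(p: DiagGMM, q: DiagGMM, n: int = 2000, seed: int = 0) -> float:
    """Estimate KL(p || q) = E_p[log p(x) - log q(x)] by sampling from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = p.sample(rng)
        total += p.log_likelihood(x) - q.log_likelihood(x)
    return total / n


def symmetric_kl(p: DiagGMM, q: DiagGMM) -> float:
    # Symmetrized distance (an assumption; the paper may use a different form).
    return kl_monte_carlo(p, q) + kl_monte_carlo(q, p)


def llr_score(target: DiagGMM, ubm: DiagGMM, frames: list[list[float]]) -> float:
    """Average per-frame log-likelihood ratio of the target model against the UBM."""
    return sum(target.log_likelihood(x) - ubm.log_likelihood(x) for x in frames) / len(frames)


def select_cohort(target: DiagGMM, candidates: dict[str, DiagGMM], n_cohort: int) -> list[str]:
    """Return the n_cohort candidate names closest to the target in symmetric KL."""
    ranked = sorted(candidates, key=lambda name: symmetric_kl(target, candidates[name]))
    return ranked[:n_cohort]


def fill_holes(target: DiagGMM, cohort: DiagGMM, counts: list[float], min_count: float) -> DiagGMM:
    """Replace under-trained target components (hypothetical 'holes') with cohort components.

    Assumes both models share component indexing, e.g. both MAP-adapted from one UBM.
    """
    means = [list(m) for m in target.means]
    variances = [list(v) for v in target.variances]
    for k, c in enumerate(counts):
        if c < min_count:
            means[k] = list(cohort.means[k])
            variances[k] = list(cohort.variances[k])
    return DiagGMM(list(target.weights), means, variances)


# Toy 1-D example with made-up numbers, for illustration only.
ubm = DiagGMM([0.5, 0.5], [[0.0], [4.0]], [[1.0], [1.0]])
target = DiagGMM([0.5, 0.5], [[0.5], [4.5]], [[1.0], [1.0]])
candidates = {
    "near": DiagGMM([0.5, 0.5], [[0.6], [4.6]], [[1.0], [1.0]]),
    "far": DiagGMM([0.5, 0.5], [[3.0], [8.0]], [[1.0], [1.0]]),
}
cohort = select_cohort(target, candidates, n_cohort=1)
# Component 1 saw almost no enrollment frames, so it is treated as a hole.
filled = fill_holes(target, candidates[cohort[0]], counts=[40.0, 0.5], min_count=1.0)
score = llr_score(filled, ubm, [[0.4], [4.7]])
```

The sketch uses Monte Carlo estimation because the KL divergence between two GMMs has no closed form. For models adapted from a shared UBM with fixed weights and covariances, a cheaper option is an upper bound computed from the weighted mean offsets of matched components.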

Similar resources

Acoustic hole filling for sparse enrollment data using a cohort universal corpus for speaker recognition.

In this study, the problem of sparse enrollment data for in-set versus out-of-set speaker recognition is addressed. The challenge here is that both the training speaker data (5 s) and test material (2-6 s) are of limited duration. The limited enrollment data result in a sparse acoustic model space for the desired speaker model. The focus of this study is on filling these acoustic holes by h...

Accent- and speaker-specific polyphone decision trees for non-native speech recognition

Acoustic models in state-of-the-art LVCSR systems are typically trained on data from thousands of speakers and then adapted to a speaker using, e.g., various combinations of CMLLR, MLLR and MAP. This adaptation step is particularly important for speakers with accents that are not well represented in the training set. The present study explores how to improve performance on South-Asian-accented ...

Pitch-dependent GMMs for text-independent speaker recognition systems

Gaussian mixture models (GMMs) and ergodic hidden Markov models (HMMs) have been successfully applied to model short-term acoustic vectors for speaker recognition systems. Prosodic features are known to carry information concerning the speaker’s identity and they can be combined with the short-term acoustic vectors in order to increase the performance of the speaker recognition system. In this ...

Speaker recognition by means of acoustic and phonetically informed GMMs

In this work we assess the recently proposed hybrid Deep Neural Network/Gaussian Mixture Model (DNN/GMM) approach for speaker recognition considering the effects of the granularity of the phonetic DNN model, and of the precision of the corresponding GMM models, which will be referred to as the phonetic GMMs. The aim of this work is to better understand the contributions of the phonetic informat...

Modeling phonemic recognition of Persian words

A model of spoken word recognition is proposed. This model is particularly concerned with extraction of cues from the signal leading to a specification of a word in terms of bundles of distinctive features, which are assumed to be the building blocks of words. In the model proposed, auditory input is chunked into a set of successive time slices. It is assumed that the derivation of the underly...

Publication date: 2008